
Dataops user guide intro #2162

Merged: rcap107 merged 9 commits into skrub-data:main from jeromedockes:dataops-user-guide on Jun 17, 2026


@jeromedockes (Member)

Rewording the intro section a bit. @rcap107

@jeromedockes added the labels documentation (Add or improve the documentation) and data_ops (Something related to the skrub DataOps) on Jun 12, 2026
@rcap107 added this to the Release 0.10 milestone on Jun 12, 2026
Comment thread on doc/data_ops.rst:
to help predict the product's category? What learning rate to set on a
:class:`~sklearn.ensemble.HistGradientBoostingRegressor`?

**Validation**  Finally, the quality of predictions must be evaluated on

Member

I wonder if the section on leakage should be moved further up.

I also think there should be a mention of leakage at the very start, because it's really important and, where it is now, the reader may reach it a bit late (even though it's not that far down the page).

Member Author

OK, I added a mention of data leakage (in bold) at the very start. For the paragraphs that follow, I think the current order roughly matches the order in which you run into the problems (building a pipeline at all, making modelling choices, validation), but that is indeed debatable.

@rcap107 left a comment

Member

I think the short snippet helps a lot. Thanks, @jeromedockes!

@rcap107 merged commit c40c6a6 into skrub-data:main on Jun 17, 2026 (29 checks passed)